PeterMac Data Science’s modified version of material by the University of Cambridge (Mark Dunning, Suraj Menon and Aiora Zabala, Robert Stojnić, Laurent Gatto, Rob Foy, John Davey, Dávid Molnár and Ian Roberts, original material: http://cambiotraining.github.io/r-intro) and Software Carpentry.

1. Introduction to R and its environment

What is R?

  • A statistical programming environment suited to high-level data analysis
  • But offers much more than just statistics
  • Open source and cross platform
  • Extensive graphics capabilities
  • Diverse range of add-on packages
  • Active community of developers
  • Thorough documentation

http://www.r-project.org/

R screenshot

New York Times, Jan 2009

R can facilitate Reproducible Research

Sidney Harris - New York Times

  • Statisticians at MD Anderson tried to reproduce results from a Duke paper and unintentionally unravelled a web of incompetence and skullduggery
    • as reported in the New York Times
New York Times, July 2011

  • Very entertaining talk from Keith Baggerly in Cambridge, December 2010

According to recent editorials, the reproducibility crisis is still ongoing.

Nature, May 2016

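Reality check on reproducibility: http://www.nature.com/news/reality-check-on-reproducibility-1.19961

1,500 scientists lift the lid on reproducibility: http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970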

Getting started

  • In this course we use a server that has everything already installed for you.
  • If you want to install R on your own computer, you can get the latest release from https://www.r-project.org/
    • Base package and Contributed packages (general purpose extras)
    • 14094 available packages as of Wed Apr 17 15:29:48 2019
  • Download from http://mirrors.ebi.ac.uk/CRAN/
  • Windows, Mac and Linux versions available
  • Executed using command line, or a graphical user interface (GUI)
  • In this course, we use the RStudio GUI (www.rstudio.com)
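
Once R is installed, a quick check from the console confirms which release you are running and how many packages are already in your library. The following is a minimal sketch using only functions that ship with R; the variable names are made up for this example, and the package named in the commented-out install.packages() call is purely illustrative, not something this course requires.

```r
# Report the version of R that is currently running
r_version <- R.version.string
print(r_version)

# Count the packages installed in your library (base plus contributed)
n_installed <- nrow(installed.packages())
print(n_installed)

# Contributed packages are installed from CRAN with install.packages().
# The package name here is only an example; uncomment the line to try it
# (this needs an internet connection).
# install.packages("ggplot2")
```

The numbers you see will differ from machine to machine, since they depend on the R release and on which contributed packages have been added.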
rstudio

Introduction to RStudio

(from Software Carpentry)

Throughout this lesson, we’re going to teach you some of the fundamentals of the R language. We’ll be using RStudio: a free, open-source integrated development environment for R. It provides a built-in editor, works on all platforms (including on servers) and offers many advantages such as integration with version control and project management.

Basic layout

When you first open RStudio, you will be greeted by three panels:

  • The interactive R console (entire left)
  • Environment/History (tabbed in upper right)
  • Files/Plots/Packages/Help/Viewer (tabbed in lower right)
RStudio layout

Once you open files, such as R scripts, an editor panel will also open in the top left.

RStudio layout with .R file open

Workflow within RStudio

There are two main ways one can work within RStudio.

  1. Test and play within the interactive R console, then copy code into a .R file to run later.
    • This works well when doing small tests and initially starting off.
    • It quickly becomes laborious.
  2. Start writing in a .R file and use RStudio’s shortcut keys for the Run command to push the current line, selected lines or modified lines to the interactive R console.
    • This is a great way to start; all your code is saved for later.
    • You will be able to run the file you create from within RStudio or using R’s source() function.

We will use the second way in this course.
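
As a rough illustration of the second workflow, the sketch below writes a two-line script to a temporary file and then runs it with source(). The file name and the variable names are made up for this example; in practice you would create the .R file in the RStudio editor and save it alongside your other work.

```r
# Create a small script file (in RStudio you would type this in the editor
# and save it; here we write it from code so the example is self-contained)
script_path <- file.path(tempdir(), "my_first_script.R")
writeLines(c(
  "heights <- c(1.62, 1.75, 1.80)",
  "mean_height <- mean(heights)"
), script_path)

# Run every line of the script, as if each had been typed at the console
source(script_path)

# Objects created by the script are now available in the workspace
print(mean_height)
```

Because the code lives in a file, the same analysis can be re-run later with a single source() call rather than being retyped.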

Tip: Running segments of your code

RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can:

  1. click on the Run button above the editor panel, or
  2. select “Run Lines” from the “Code” menu, or
  3. hit Ctrl+Return in Windows or Linux, or ⌘+Return on OS X. (This shortcut can also be seen by hovering the mouse over the button.)

To run a block of code, select it and then click Run. If you have modified a line of code within a block you have just run, there is no need to reselect the section and run it again; you can use the next button along, Re-run the previous region. This will run the previous code block including the modifications you have made.
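
To practise these options, you could paste a few lines like the following into a new .R file in the editor. This is only a sketch for trying out the Run command; the variable names and values are invented for the exercise.

```r
# Put the cursor on either of the next two lines and run just that line
x <- 10
y <- x * 2

# Select the next three lines together and click Run to run them as a block
total <- x + y
message("The total is ", total)
total_squared <- total^2
```

After running the block once, try changing the value of x and running the lines again to see the results in the console change.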
